Genome Modeling System: A Knowledge Management Platform for Genomics
Authors
Abstract
In this work, we present the Genome Modeling System (GMS), an analysis information management system capable of executing automated genome analysis pipelines at a massive scale. The GMS framework provides detailed tracking of samples and data coupled with reliable and repeatable analysis pipelines. The GMS also serves as a platform for bioinformatics development, allowing a large team to collaborate on data analysis, or an individual researcher to leverage the work of others effectively within its data management system. Rather than separating ad-hoc analysis from rigorous, reproducible pipelines, the GMS promotes systematic integration between the two. As a demonstration of the GMS, we performed an integrated analysis of whole genome, exome and transcriptome sequencing data from a breast cancer cell line (HCC1395) and matched lymphoblastoid line (HCC1395BL). These data are available for users to test the software, complete tutorials and develop novel GMS pipeline configurations. The GMS is available at https://github.com/genome/gms.
Similar resources
Sputnik: a database platform for comparative plant genomics
Two million plant ESTs, from 20 different plant species and totalling more than 1000 Mbp of DNA sequence, represent a formidable transcriptomic resource. Sputnik uses the potential of this sequence resource to fill some of the information gap in the un-sequenced plant genomes and to serve as the foundation for in silico comparative plant genomics. The complexity of the individual EST col...
Sharable DBMS for Genome Informatics
The primary aim of this project is to produce and disseminate a freely sharable, domain-specific database management system (DBMS) suitable for use as a component of a genome informatics system. Over the past three-and-a-half years we have developed and operated genome informatics systems for the genetic- and physical-mapping projects carried out at the Whitehead Institute/MIT Center for Genome...
Data Management for High-Throughput Genomics
Today's sequencing technology allows sequencing an individual genome within a few weeks for a fraction of the cost of the original Human Genome Project. Genomics labs are faced with dozens of TB of data per week that have to be automatically processed and made available to scientists for further analysis. This paper explores the potential and the limitations of using relational database system...
Platform-based product design and development: A knowledge-intensive support approach
This paper presents a knowledge-intensive support paradigm for platform-based product family design and development. The fundamental issues underlying the product family design and development, including product platform and product family modeling, product family generation and evolution, and product family evaluation for customization, are discussed. A module-based integrated design scheme is...
Genomics of Parasites
Genes carry instructions to make proteins that affect the body's cells and their physical activity. They also play an important role in the occurrence of various characteristics in the body. Recently, scientists in the new field of science known as genomics have studied the genetic instructions. Genomics deals with the discovery of all the sequences in the entire genome of organisms and is used to s...